Eulerian Path Methods for Multiple Sequence Alignment

Authors

  • Michael S. Waterman
  • Yu Zhang
Abstract

With the rapid increase in the size of genome sequence databases, the multiple sequence alignment problem is increasingly important and often requires the alignment of a large number of sequences. Since 1975, many heuristic algorithms have been developed to improve the speed of computation and the quality of alignment. We introduce a novel approach that is fundamentally distinct from all currently available methods. Our motivation comes from the Eulerian method for fragment assembly in DNA sequence determination, which transforms all the DNA sequencing fragments into a de Bruijn graph and then reduces sequence assembly to an Eulerian path problem. This lecture focuses on global multiple alignment of DNA sequences, where entire sequences are aligned into one configuration. The main result is an algorithm whose running time is almost linear in the total size (number of letters) of the sequences to be aligned. In a simulation, 500 sequences (averaging 500 bases per sequence, with pairwise identity as low as 70%) were aligned within 3 minutes on a personal computer, with satisfactory alignment quality. As a result, accurately and simultaneously aligning thousands of long sequences within a reasonable amount of time becomes possible. Data from an Arabidopsis sequencing project are used to demonstrate the performance.

Proceedings of the Computational Systems Bioinformatics (CSB’03), © 2003 IEEE
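The de Bruijn graph and Eulerian path idea that the abstract cites as motivation can be illustrated with a small sketch. This is not the authors' alignment algorithm; it only shows the underlying fragment-assembly construction on error-free toy data. The function names (`de_bruijn_edges`, `eulerian_path`, `spell`), the choice of k = 3, and the collapsing of repeated k-mers into a single edge are all assumptions made for illustration.

```python
from collections import defaultdict


def de_bruijn_edges(fragments, k):
    """Build a de Bruijn graph: every distinct k-mer becomes one edge
    from its (k-1)-mer prefix to its (k-1)-mer suffix."""
    graph = defaultdict(list)
    seen = set()
    for frag in fragments:
        for i in range(len(frag) - k + 1):
            kmer = frag[i:i + k]
            if kmer not in seen:  # simplification: repeated k-mers collapse
                seen.add(kmer)
                graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm. Assumes an Eulerian path exists."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next(
        (u for u, targets in graph.items() if len(targets) - in_deg[u] == 1),
        next(iter(graph)),
    )
    remaining = {u: list(targets) for u, targets in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: emit node
    path.reverse()
    return path


def spell(path):
    """Reconstruct the sequence spelled by a path of (k-1)-mers."""
    return path[0] + "".join(node[-1] for node in path[1:])


# Toy overlapping fragments (hypothetical data).
fragments = ["ATGGC", "GGCGT", "CGTGA"]
graph = de_bruijn_edges(fragments, k=3)
print(spell(eulerian_path(graph)))
```

Each fragment is read once to extract its k-mers, and Hierholzer's algorithm visits each edge once, so the work grows roughly linearly with the total number of letters. That property of the Eulerian formulation is what the abstract's near-linear claim builds on.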

Similar resources

DNA sequence assembly and multiple sequence alignment by an Eulerian path approach.

We describe an Eulerian path approach to the DNA fragment assembly that was originated by Idury and Waterman 1995, and then advanced by Pevzner et al. 2001b. This combinatorial approach bypasses the traditional “overlap-layout-consensus” approach and successfully resolved some of the troublesome repeats in practical assembly projects. The assembly results by the Eulerian path approach are accur...

An Eulerian Path Approach to Global Multiple Alignment for DNA Sequences

With the rapid increase in the dataset of genome sequences, the multiple sequence alignment problem is increasingly important and frequently involves the alignment of a large number of sequences. Many heuristic algorithms have been proposed to improve the speed of computation and the quality of alignment. We introduce a novel approach that is fundamentally different from all currently available...

An Eulerian path approach to local multiple alignment for DNA sequences.

Expensive computation in handling a large number of sequences limits the application of local multiple sequence alignment. We present an Eulerian path approach to local multiple alignment for DNA sequences. The computational time and memory usage of this approach is approximately linear to the total size of sequences analyzed; hence, it can handle thousands of sequences or millions of letters s...

An Application of the ABS LX Algorithm to Multiple Sequence Alignment

We present an application of ABS algorithms for multiple sequence alignment (MSA). The Markov decision process (MDP) based model leads to a linear programming problem (LPP), whose solution is linked to a suggested alignment. The important features of our work include the facility of alignment of multiple sequences simultaneously and no limit for the length of the sequences. Our goal here is to ...

Fast A* Algorithms for Multiple Sequence Alignment

The multiple alignment of the sequences of DNA and proteins is applicable to various important fields in molecular biology. Although the approach based on Dynamic Programming is well-known for this problem, it requires enormous time and space to obtain the optimal alignment. On the other hand, this problem corresponds to the shortest path problem and the A* algorithm, which can efficiently find the sh...

Publication date: 2003